Data Clustering Based on Hybrid of Fuzzy and Swarm Intelligence Algorithm Using Euclidean and Non-Euclidean Distance Metrics: A Comparative Study
Abstract
Data mining is a collection of techniques used to extract useful information from large databases. Data clustering is a popular data mining technique: it groups a set of objects so that similar objects are placed in the same cluster while dissimilar objects fall into separate clusters. Fuzzy c-means (FCM) is one of the most popular clustering algorithms. However, it has limitations, such as sensitivity to initialization and a tendency to get stuck at local optima. Swarm intelligence algorithms are global optimization techniques that have recently been applied successfully to many real-world optimization problems. The Constriction Factor Particle Swarm Optimization (cfPSO) algorithm is a population-based global optimization technique that is used to solve data clustering problems. Euclidean distance is the well-known metric used in most of the literature. Its drawbacks are that it is blind to correlated variables, is not robust in noisy environments, is affected by outlier data points, and handles only data sets whose clusters have equal size, equal density and spherical shape. Real-world data sets, however, may exhibit different shapes. In this paper, a Fuzzy based Constriction Factor PSO (FUZZY-cfPSO-FCM) algorithm is proposed that uses non-Euclidean distance metrics, namely the Kernel, Mahalanobis and New distance, and it is applied to several benchmark data sets from the UCI machine learning repository. The proposed hybrid algorithm combines the advantages of the FCM and cfPSO algorithms. The clustering results are evaluated through fitness value, accuracy rate and failure rate. Experimental results show that the proposed hybrid algorithm achieves better results on various data sets.
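The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows the standard FCM membership update with a pluggable squared distance, a Gaussian kernel-induced distance as one example of a non-Euclidean metric, and the Clerc-Kennedy constriction coefficient used by cfPSO. The function names, the bandwidth `sigma`, the fuzzifier `m = 2` and the acceleration constants `c1 = c2 = 2.05` are assumptions chosen for illustration. Using the FCM objective as the fitness of a particle that encodes the cluster centers is one common way to combine the two algorithms and is likewise assumed here.

```python
import math

def euclidean_sq(x, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, v))

def gaussian_kernel_sq(x, v, sigma=1.0):
    """Squared kernel-induced distance 2 * (1 - K(x, v)) for a Gaussian K.
    sigma is an assumed bandwidth, not a value taken from the paper."""
    return 2.0 * (1.0 - math.exp(-euclidean_sq(x, v) / sigma ** 2))

def fcm_memberships(points, centers, dist_sq=euclidean_sq, m=2.0, eps=1e-12):
    """Standard FCM update: row i holds the memberships of point i in each cluster."""
    power = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        inv = [(1.0 / max(dist_sq(x, v), eps)) ** power for v in centers]
        total = sum(inv)
        memberships.append([w / total for w in inv])
    return memberships

def fcm_objective(points, centers, memberships, dist_sq=euclidean_sq, m=2.0):
    """FCM objective J = sum_i sum_j u_ij^m * d^2(x_i, v_j); lower is better.
    Assumed here to serve as the fitness of a particle encoding the centers."""
    return sum(
        u ** m * dist_sq(x, v)
        for x, row in zip(points, memberships)
        for u, v in zip(row, centers)
    )

def constriction_factor(c1=2.05, c2=2.05):
    """Clerc-Kennedy constriction coefficient; requires c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("c1 + c2 must exceed 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

def cfpso_velocity(v, x, pbest, gbest, r1, r2, c1=2.05, c2=2.05):
    """One constricted velocity update for a single particle.
    r1 and r2 are random numbers in [0, 1] supplied by the caller."""
    chi = constriction_factor(c1, c2)
    return [
        chi * (vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
        for vi, xi, pi, gi in zip(v, x, pbest, gbest)
    ]

# Toy data (made up for illustration): two points near the origin, one far away.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
u_euclid = fcm_memberships(points, centers)
u_kernel = fcm_memberships(points, centers, dist_sq=gaussian_kernel_sq)
fitness = fcm_objective(points, centers, u_euclid)
```

Swapping `dist_sq` is the only change needed to compare metrics in this sketch. A Mahalanobis variant would additionally need an inverse covariance matrix per cluster and is left out for brevity.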
Similar resources
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Stock Price Prediction using Machine Learning and Swarm Intelligence
Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...
Distance Based Hybrid Approach for Cluster Analysis Using Variants of K-means and Evolutionary Algorithm
Clustering is a process of grouping similar objects into a specified number of clusters. K-means and K-medoids algorithms are the most popular partitional clustering techniques for large data sets. However, they are sensitive to the random selection of initial centroids and tend to fall into local optima. The K-means++ algorithm has a better convergence rate than other algorithms. Distance metric is used...
Assessment of the Log-Euclidean Metric Performance in Diffusion Tensor Image Segmentation
Introduction: An appropriate definition of the distance measure between diffusion tensors has a deep impact on Diffusion Tensor Image (DTI) segmentation results. The geodesic metric is the best distance measure since it yields high-quality segmentation results. However, the main problem with the geodesic metric is the high computational cost of the algorithms based on it. The main goal of this ...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...